Strenkert D, Schmollinger S, Gallaher SD, et al. Multiomics resolution of molecular events during a day in the life of Chlamydomonas. Proc Natl Acad Sci U S A. 2019;116(6):2374-2383.doi:10.1073/pnas.1815238116
my RNAseq workflow
After treating the raw RNAseq data with my rna_workflow.smk, I obtained a counts table with each RNAseq library, following the time series order.
In this R markdown report, I perform a short analysis of this data and extract information on some potential or known Organellar Trans-Acting Factors (OTAFs) of Chlamydomonas reinhardtii.
## Loading counts data and metadata
meta_temp <- read.csv("../data/sample_DiscoRythm_format.tsv", sep="\t")
counts_rund_df <- read.csv("../results/mapping/full_data/total_counts_int.csv", sep="\t", row.names = 1)
## Picking colors for timepoints
meta_temp$color <- c(rep("#000f3b", 3), rep("#00144e", 3), rep("#001962", 3),
rep("#001162", 3), rep("#001789", 3), rep("#001eb1", 3),
rep("#143bff", 3), rep("#86bcf9", 3), rep("#ffc576", 3),
rep("#ffd400", 3), rep("#ffe900", 3), rep("#fff04e", 3),
rep("#fff589", 3), rep("#cec031", 3), rep("#ffbf00", 3),
rep("#001162", 3))
## Loading annotation data
annotations <- read.csv("../data/phytozome/Creinhardtii/v5.6/annotation/gene_annotation.tsv",
sep="\t")
annotations_organelles <- read.csv("../data/phytozome/Creinhardtii/v5.6/annotation/organelles_annotation.tsv",
sep="\t")
# /!\ Removing duplicated genes
annotations$gene_id <- make.unique(annotations$gene_id, sep=".")
## Matrix conversion
matrix_counts_DESeq2 <- as.matrix(counts_rund_df[,2:49])
rownames(matrix_counts_DESeq2) <- counts_rund_df[,1]DESeq2:
Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA seq data with DESeq2.” Genome Biology, 15, 550. doi: 10.1186/s13059-014-0550-8
#### Normalisation and transformation of count data with DESeq2 ####
ddsFull <- DESeqDataSetFromMatrix(countData = matrix_counts_DESeq2,
colData = meta_temp,
design = ~ time)
## Size factors
ddsFull <- estimateSizeFactors(ddsFull)
## Taking out unexpressed genes
ddsFull <- ddsFull[ rowSums(counts(ddsFull)) > 0, ]
## rlog transformation
rlog_data <- rlog(ddsFull, blind=TRUE)
boxplot(counts(ddsFull, normalized=FALSE),
col = meta_temp$color,
horizontal = TRUE,
las = 1,
cex.axis = 0.45,
cex = 0.5,
xlab = "raw value",
main="Boxplots of untransformed
gene counts")boxplot(assay(rlog_data),
col = meta_temp$color,
horizontal = TRUE,
las = 1,
cex.axis = 0.45,
cex = 0.5,
xlab = "rlog(value)",
main="Boxplots of rlog transformed
gene counts")## Transformation visualisation
par(mfrow=c(1,2))
plot(counts(ddsFull, normalized=FALSE)[,11:12],
pch=16, cex=0.3, xlim=c(0,20e3), ylim=c(0,20e3), main="raw counts")
plot(assay(rlog_data)[,11:12],
pch=16, cex=0.3, main="rlog normalized counts")DESeq2 normalises the data according to library size (size factors) then uses a rlog transformation to diminish the impact of very low or very high expression genes on the subsequent analyses.
## rlog transformed data, PCA
## extract rlog matrix
counts_rlog <- assay(rlog_data)
## Run PCA
res_pca <- PCA(t(counts_rlog), graph=FALSE)
## Graphs
fviz_eig(res_pca, title="PCA of rlog data, Eigen values", addlabels = TRUE, ylim = c(0, 50))PCA on rlog transformed count data.
fviz_pca_ind(res_pca, label="none",
title="PCA of rlog data",
habillage=as.factor(meta_temp$time),
mean.point=FALSE,
pointshape = 19) PCA on rlog transformed count data.
On this Principal Component Analysis the cyclical nature of the experiment is apparent. From the time point 0h, the time points descend counter-clock wise, to return at the final 24h time point at the level of the first one. Note how time points at the end of the night (6h to 11h) are more closely related then the others, this is because Chlamydomonas cells are relatively quiescent at this part of their day cycle.
dists <- dist(t(assay(rlog_data)))
par(mfrow=c(1,1))
tree_rlog <- hclust(dists)
my_colors = c("#ffc576", "#445cd7", "#fff04e", "#001789", "#faf5c6")
par(bg = "darkgrey", mfrow=c(1, 1))
plotColoredClusters(tree_rlog, labs = meta_temp$time,
ylab = NA, xlab = NA, cex = , las = 1,
cols = meta_temp$color, col = "white",
main = "Samples Euclidian distance hierarchical
clustering, rlog, complete linkage")
rect.hclust(tree_rlog, k=5, border=my_colors)Hierarchical clustering of rlog transformed count data.
par(bg="white")
# Saving clusters
clusters_names <- c("early_night",
"night",
"dawn",
"day",
"dusk")
clusters_rlog <- cutree(tree_rlog, k=5)
meta_temp$cluster <- clusters_names[clusters_rlog]The hierarchical clustering of the transformed data gives coherent clusters, grouping samples according to time periods. In this report I decided to use these clusters to separate the samples in the differential expression analysis.
Here I use the previously defined clusters as formula to look for differentially expressed genes with DESeq2. However this can also be achieved by using directly the time conditions (as the R script in the workflow does).
#### DESeq2 differential expression analysis by time point ####
ddsTimes <- DESeqDataSetFromMatrix(countData = matrix_counts_DESeq2,
colData = meta_temp,
design = ~ cluster) # ~ time_factor could be used instead
## Elimination of lowly expressed genes
ddsTimes <- ddsTimes[ rowSums(counts(ddsTimes)) > 60, ]
## Differential expression analysis
ddsTimes <- DESeq(ddsTimes)
time_points <- unique(meta_temp$cluster) # time_factor could be used instead
time_rep <- c(rep(time_points, c(1, rep(2, length(time_points)-1))), time_points[1])
pairs <- matrix(as.factor(time_rep), ncol = 2, byrow = TRUE)
# Significant genes initialisation
significant_genes <- c()
# Extraction of DESeq2 results
for (i in (1 : nrow(pairs))) {
ele <- pairs[i,]
text <- paste0(ele[1], "/", ele[2], " DEG:")
resDESeq <- results(ddsTimes, contrast = c("cluster", ele[1], ele[2]), # "time_factor" could be used instead
independentFiltering = TRUE, alpha=0.01)
message(text)
message( sum( resDESeq$padj < 0.01, na.rm=TRUE ) )
temp_list <- row.names(resDESeq[which(resDESeq$padj < 0.01),])
message(paste0(c(length(temp_list)/nrow(counts_rund_df) * 100), " %"))
significant_genes <- union(temp_list, significant_genes)
}
print(paste0("Total DEG in at least one time transition: ", (length(significant_genes)/nrow(counts_rund_df) * 100), " %"))## [1] "Total DEG in at least one time transition: 85.614094148011 %"
sign_df <- data.frame(row.names = significant_genes)
sign_df$diff_expr <- "yes"
sign_df$gene_id <- row.names(sign_df)The use of time condition (as in the workflow R script) instead of the clusters gives less DEG, about 70%.
DiscoRhythm:
Matthew Carlucci, Algimantas Kriščiūnas, Haohan Li, Povilas Gibas, Karolis Koncevičius, Art Petronis, Gabriel Oh, DiscoRhythm: an easy-to-use web application and R package for discovering rhythmicity, Bioinformatics, Volume 36, Issue 6, 15 March 2020, Pages 1952–1954, https://doi.org/10.1093/bioinformatics/btz834
# Converting rlog matrix in SE (Submarised Experiment))
gene_id <- row.names(counts_rlog)
rlog_df <- as.data.frame(counts_rlog)
rlog_df <- add_column(rlog_df, gene_id, .before=1)
input_data <- discoDFtoSE(rlog_df)
discoDesignSummary(input_data)## 0 2
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
## 4 6
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
## 8 10
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
## 10.5 11
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
## 11.5 12
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
## 14 16
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
## 18 20
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
## 22 24
## Total "3" "3"
## "Biological Sample 1 (1)" "Biological Sample 1 (1)"
## "Biological Sample 2 (1)" "Biological Sample 2 (1)"
## "Biological Sample 3 (1)" "Biological Sample 3 (1)"
# Our timepoints are not equidistant, we can use either the cosinor or Lomb-Scargle method to detect feature rhythms
# Cosinor
rythms_genes_CS <- discoODAs(input_data, period = 24, method = "CS",
circular_t = TRUE, ncores = 4)
# Lomb-Scargle
rythms_genes_LS <- discoODAs(input_data, period = 24, method = "LS",
circular_t = TRUE, ncores = 4)
par(mfrow=c(1,2))
hist(data.frame(rythms_genes_CS)$CS.qvalue, breaks = 100,
main = "qvalue cosinor",
xlab = "qvalue")
hist(data.frame(rythms_genes_LS)$LS.qvalue, breaks = 100,
main = "qvalue Lomb-Scargle",
xlab = "qvalue")The Lomb-Scargle qvalue distribution has a problematic profile, we will focus on Cosinor.
# Extracting just the dataframes from the discorythm output
CS_df <- data.frame(rythms_genes_CS)
# Retreiving only most significant genes
# qvalue inferior to 0.05
CS_df <- CS_df[CS_df$CS.qvalue < 0.05,]
par(mfrow=c(1,2))
hist(CS_df$CS.amplitude, breaks = 100,
main = "amplitudes cosinor
after selection",
xlab = "amplitude")
hist(CS_df$CS.acrophase, breaks = 100,
main = "acrophases cosinor
after selection",
xlab = "hour")The acrophase is the time when the expression of the gene peaks. We will focus on the acrophase parameter instead of the amplitude of the expression signal to find putative regulators. Indeed, what we are interested in is not the intensity of gene expression, but rather the timing of expression.
## Extraction of annotations
row_CS <- row.names(CS_df)
CS_df$gene_id <- row_CS
CS_df <- left_join(CS_df, annotations)
id_organelle = annotations_organelles$gene_id
row.names(annotations_organelles) <- id_organelle
row_CS -> row.names(CS_df)
for (id in id_organelle) {
CS_df[id, "gene_symbol"] <- annotations_organelles[id,"gene_symbol"]
}
CS_df$gene_id <- row.names(CS_df)
CS_df$encoded <- "Nucleus"
CS_df[which(startsWith(CS_df$gene_id, "CreMt.")), "encoded"] <- "Mitochondrion"
CS_df[which(startsWith(CS_df$gene_id, "CreCp.")), "encoded"] <- "Chloroplast"
CS_df[which(startsWith(CS_df$gene_id, "CreMt.")), "subcellular_location"] <- "Mitochondrion"
CS_df[which(startsWith(CS_df$gene_id, "CreCp.")), "subcellular_location"] <- "Chloroplast"
CS_df[which(startsWith(CS_df$gene_id, "CreMt.")), "simplified_subcellular_location"] <- "Mitochondrion"
CS_df[which(startsWith(CS_df$gene_id, "CreCp.")), "simplified_subcellular_location"] <- "Chloroplast"
CS_df[which(startsWith(CS_df$gene_description, "OctotricoPeptide") == TRUE), ] -> CS_subset_opr
CS_df[which(startsWith(CS_df$gene_description, "PentatricoPeptide") == TRUE), ] -> CS_subset_ppr
# Genes both rhythmic and DE
final_data <- left_join(CS_df, sign_df, by = "gene_id")
final_data <- final_data[which(final_data$diff_expr == "yes"),]
final_data <- final_data[ which(!is.na(final_data$CS.acrophase)) , ]ggplot(data = final_data, mapping = aes(x=CS.acrophase, fill=encoded))+
geom_histogram(aes(x = CS.acrophase, y = ..density..), binwidth = 0.5) +
facet_wrap(~encoded) +
labs( x = "Acrophase", y = "Genes",
title ="Acrophases distribution according to encoding genome",
subtitle = "Cosinor method")+
scale_fill_manual(values = c("#99e55c", "#eb957c", "#0fc6e1"))Acrophases distribution.
c(paste0("Chloroplast", " (", nrow(final_data[final_data$simplified_subcellular_location == "Chloroplast",]), ")"),
paste0("Chromosome", " (", nrow(final_data[final_data$simplified_subcellular_location == "Chromosome",]), ")"),
paste0("Cilium", " (", nrow(final_data[final_data$simplified_subcellular_location == "Cilium",]), ")"),
paste0("Cytoplasm", " (", nrow(final_data[final_data$simplified_subcellular_location == "Cytoplasm",]), ")"),
paste0("Cytoskeleton", " (", nrow(final_data[final_data$simplified_subcellular_location == "Cytoskeleton",]), ")"),
paste0("Endoplasmic reticulum", " (", nrow(final_data[final_data$simplified_subcellular_location == "Endoplasmic reticulum",]), ")"),
paste0("Golgi apparatus", " (", nrow(final_data[final_data$simplified_subcellular_location == "Golgi apparatus",]), ")"),
paste0("Membrane", " (", nrow(final_data[final_data$simplified_subcellular_location == "Membrane",]), ")"),
paste0("Mitochondrion", " (", nrow(final_data[final_data$simplified_subcellular_location == "Mitochondrion",]), ")"),
paste0("Nucleus", " (", nrow(final_data[final_data$simplified_subcellular_location == "Nucleus",]), ")"),
paste0("Other", " (", nrow(final_data[final_data$simplified_subcellular_location == "Other",]), ")"),
paste0("unknown", " (", nrow(final_data[final_data$simplified_subcellular_location == "unknown",]), ")")) -> locations
#final_data2 <- filter(final_data, simplified_subcellular_location != "unknown")
ggplot(data = final_data, mapping = aes(x=CS.acrophase, fill=simplified_subcellular_location))+
annotate("rect", fill = "darkgray", xmin = 0, xmax = 11, ymin =0, ymax = Inf, alpha =0.3) +
annotate("rect", fill = "gold", xmin = 11, xmax = 23, ymin =0, ymax = Inf, alpha =0.3) +
annotate("rect", fill = "darkgray", xmin = 23, xmax = Inf, ymin =0, ymax = Inf, alpha =0.3)+
geom_histogram(aes(x = CS.acrophase, y = ..density..), binwidth = 0.3) +
geom_line(stat = "density", alpha=0.5) +
facet_wrap(~simplified_subcellular_location)+
labs( x = "Acrophase", y = "Genes",
title ="Acrophases distribution according to cellular location",
subtitle = "Cosinor method")+
scale_fill_discrete(name = "Cellular location", labels = locations)Acrophases distribution.
# Recovering some rhythmic OTAFs
opr <- final_data[ which(startsWith(final_data$gene_description, "OctotricoPeptide Repeat")) , ]
ppr <- final_data[ which(startsWith(final_data$gene_description, "PentatricoPeptide Repeat")) , ]
tpr <- final_data[ which(startsWith(final_data$gene_description, "TetratricoPeptide Repeat")) , ]
otaf <- union_all(opr, ppr)
otaf <- union_all(otaf, tpr)
# Adding expression data:
columns <- colnames(otaf)
otaf <- left_join(otaf, rlog_df)
colnames(otaf) <- make.unique(c(columns, rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)))
# Predicted chloroplast imported OTAFs
multi_otaf_chloro <- data.frame(time=0, gene_expression=0.0, gene="")
for (prot in otaf[which(otaf$subcellular_location == "Chloroplast"),]$gene_symbol){
otaf[otaf$gene_symbol == prot, 19:66] %>%
pivot_longer(cols = c(1:48), names_to = "time", values_to = "gene_expression") -> tmp_df
tmp_df$time <- rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)
tmp_df$gene <- prot
multi_otaf_chloro <- rbind(multi_otaf_chloro, tmp_df)
}
multi_otaf_chloro <- multi_otaf_chloro[-1,]
# Predicted mitochondrion imported OTAFs
multi_otaf_mito <- data.frame(time=0, gene_expression=0.0, gene="")
for (prot in otaf[which(otaf$subcellular_location == "Mitochondrion"),]$gene_symbol){
otaf[otaf$gene_symbol == prot, 19:66] %>%
pivot_longer(cols = c(1:48), names_to = "time", values_to = "gene_expression") -> tmp_df
tmp_df$time <- rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)
tmp_df$gene <- prot
multi_otaf_mito <- rbind(multi_otaf_mito, tmp_df)
}
multi_otaf_mito <- multi_otaf_mito[-1,]# Extracting organellar rhythmic genes
chloro <- final_data[ which(startsWith(final_data$gene_id, "CreCp")) , ]
mito <- final_data[ which(startsWith(final_data$gene_id, "CreMt")) , ]
# Adding expression data:
columns <- colnames(chloro)
chloro <- left_join(chloro, rlog_df)
colnames(chloro) <- make.unique(c(columns, rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)))
chloro$gene_symbol <- make.unique(chloro$gene_symbol)
columns <- colnames(mito)
mito <- left_join(mito, rlog_df)
colnames(mito) <- make.unique(c(columns, rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)))
# Chloroplast
multi_chloro <- data.frame(time=0, gene_expression=0.0, gene="")
for (prot in chloro$gene_symbol){
chloro[chloro$gene_symbol == prot, 19:66] %>%
pivot_longer(cols = c(1:48), names_to = "time", values_to = "gene_expression") -> tmp_df
tmp_df$time <- rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)
tmp_df$gene <- prot
multi_chloro <- rbind(multi_chloro, tmp_df)
}
multi_chloro <- multi_chloro[-1,]
# Mitochondrion
multi_mito <- data.frame(time=0, gene_expression=0.0, gene="")
for (prot in mito$gene_symbol){
mito[mito$gene_symbol == prot, 19:66] %>%
pivot_longer(cols = c(1:48), names_to = "time", values_to = "gene_expression") -> tmp_df
tmp_df$time <- rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)
tmp_df$gene <- prot
multi_mito <- rbind(multi_mito, tmp_df)
}
multi_mito <- multi_mito[-1,]periode=24
ggplot(data = multi_otaf_chloro, mapping = aes(x=time, y=gene_expression, group=gene, color=gene)) +
geom_smooth(method = "lm", se = FALSE, level = 0.95,
formula = y ~ sin(x / periode * 2 * pi) + cos(x / periode * 2 * pi),
fullrange = TRUE) +
theme(legend.position = "none") +
geom_point() +
labs(y = "rlog(counts)", x = "Time (h)",
title ="Rhythmic chloroplast localised OTAFs, expression models:",
subtitle = "Cosinor method")+
theme(plot.title = element_text(size = 10))Gene expression models.
ggplot(data = multi_chloro, mapping = aes(x=time, y=gene_expression, group=gene, color=gene)) +
geom_smooth(method = "lm", se = FALSE, level = 0.95,
formula = y ~ sin(x / periode * 2 * pi) + cos(x / periode * 2 * pi),
fullrange = TRUE) +
theme(legend.position = "none") +
geom_point() +
labs(y = "rlog(counts)", x = "Time (h)",
title ="Rhythmic chloroplast genes, expression models:",
subtitle = "Cosinor method")+
theme(plot.title = element_text(size = 10))Gene expression models.
ggplot(data = multi_otaf_mito, mapping = aes(x=time, y=gene_expression, group=gene, color=gene)) +
geom_smooth(method = "lm", se = FALSE, level = 0.95,
formula = y ~ sin(x / periode * 2 * pi) + cos(x / periode * 2 * pi),
fullrange = TRUE) +
theme(legend.position = "none") +
geom_point() +
labs(y = "rlog(counts)", x = "Time (h)",
title ="Rhythmic mitochondrion localised OTAFs, expression models:",
subtitle = "Cosinor method")+
theme(plot.title = element_text(size = 10))Gene expression models.
ggplot(data = multi_mito, mapping = aes(x=time, y=gene_expression, group=gene, color=gene)) +
geom_smooth(method = "lm", se = FALSE, level = 0.95,
formula = y ~ sin(x / periode * 2 * pi) + cos(x / periode * 2 * pi),
fullrange = TRUE) +
theme(legend.position = "none") +
geom_point() +
labs(y = "rlog(counts)", x = "Time (h)",
title ="Rhythmic mitochondrion genes, expression models:",
subtitle = "Cosinor method")+
theme(plot.title = element_text(size = 10))Gene expression models.
To pair OTAFs and their putative organellar mRNA targets we must look at:
OTAFs and mRNA localised in the same organelle
With acrophases offset by a few hours. From mRNA the OTAF must be translated, then imported into the organelle where it might act on its mRNA target. Here I used a 4 hours shifted potential window.
acro_day_chloro <- data.frame(time=0, gene_expression=0.0, gene="")
for (prot in chloro[which(chloro$CS.acrophase <15 & chloro$CS.acrophase >12 ),]$gene_symbol) {
chloro[chloro$gene_symbol == prot, 19:66] %>%
pivot_longer(cols = c(1:48), names_to = "time", values_to = "gene_expression") -> tmp_df
tmp_df$time <- rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)
tmp_df$gene <- prot
acro_day_chloro <- rbind(acro_day_chloro, tmp_df)
}
acro_day_chloro <- acro_day_chloro[-1,]
otaf_chloro_acro_day <- data.frame(time=0, gene_expression=0.0, gene="")
for (prot in otaf[which(otaf$subcellular_location == "Chloroplast" & otaf$CS.acrophase <11 & otaf$CS.acrophase >8),]$gene_symbol){
otaf[otaf$gene_symbol == prot, 19:66] %>%
pivot_longer(cols = c(1:48), names_to = "time", values_to = "gene_expression") -> tmp_df
tmp_df$time <- rep(c(0, 2, 4, 6, 8, 10, 10.5, 11, 11.5, 12, 14, 16, 18, 20, 22, 24), each=3)
tmp_df$gene <- prot
otaf_chloro_acro_day <- rbind(otaf_chloro_acro_day, tmp_df)
}
otaf_chloro_acro_day <- otaf_chloro_acro_day[-1,]
# Plots
ggplot(data = otaf_chloro_acro_day, mapping = aes(x=time, y=gene_expression, group=gene, color=gene)) +
annotate("rect", fill = "darkgray", xmin = 0, xmax = 11, ymin =6, ymax = 13, alpha =0.3) +
annotate("rect", fill = "gold", xmin = 11, xmax = 23, ymin =6, ymax = 13, alpha =0.3) +
annotate("rect", fill = "darkgray", xmin = 23, xmax = Inf, ymin =6, ymax = 13, alpha =0.3)+
facet_wrap(~gene) +
ylim(6, 13) +
geom_smooth(method = "lm", se = FALSE, level = 0.95,
formula = y ~ sin(x / periode * 2 * pi) + cos(x / periode * 2 * pi),
fullrange = TRUE) +
theme(legend.position = "none") +
geom_point() +
labs(y = "rlog(counts)", x = "Time (h)",
title ="Chloroplast located OTAFs, acrophase at end of night",
subtitle = "Cosinor method") +
theme(plot.title = element_text(size = 11))+
theme(strip.text.x = element_text(size = 11))Candidate OTAF/mRNA target pairs.
ggplot(data = acro_day_chloro, mapping = aes(x=time, y=gene_expression, group=gene, color=gene)) +
ylim(10, 21) +
annotate("rect", fill = "darkgray", xmin = 0, xmax = 11, ymin =10, ymax = 21, alpha =0.3) +
annotate("rect", fill = "gold", xmin = 11, xmax = 23, ymin =10, ymax = 21, alpha =0.3) +
annotate("rect", fill = "darkgray", xmin = 23, xmax = Inf, ymin =10, ymax = 21, alpha =0.3)+
facet_wrap(~gene) +
geom_smooth(method = "lm", se = FALSE, level = 0.95,
formula = y ~ sin(x / periode * 2 * pi) + cos(x / periode * 2 * pi),
fullrange = TRUE) +
theme(legend.position = "none") +
geom_point() +
labs(y = "rlog(counts)", x = "Time (h)",
title ="Chloroplast mRNA, acrophase at dawn",
subtitle = "Cosinor method")+
theme(plot.title = element_text(size = 11))+
theme(strip.text.x = element_text(size = 11))Candidate OTAF/mRNA target pairs.
Here, we can identify known OTAF/chloroplast mRNA pairs/trios:
MDA1, TDA1 and atpA
MDH1 (MTHI1), atpH and atpI
CCS2 and ccsA
MCG1 and petG
RAA1, RAA8 and psbA
But also other potential OTAF/mRNA pairs!
Eberhard S, Loiselay C, Drapier D, Bujaldon S, Girard-Bascou J, Kuras R, Choquet Y, Wollman FA. Dual functions of the nucleus-encoded factor TDA1 in trapping and translation activation of atpA transcripts in Chlamydomonas reinhardtii chloroplasts. Plant J. 2011 Sep;67(6):1055-66. doi: 10.1111/j.1365-313X.2011.04657.x. Epub 2011 Jul 18. PMID: 21623973.
Wang F, Johnson X, Cavaiuolo M, Bohne AV, Nickelsen J, Vallon O. Two Chlamydomonas OPR proteins stabilize chloroplast mRNAs encoding small subunits of photosystem II and cytochrome b6 f. Plant J. 2015 Jun;82(5):861-73. doi: 10.1111/tpj.12858. PMID: 25898982.
Cline, S. G., Laughbaum, I. A. and Hamel, P. P. (2017) CCS2, an Octatricopeptide-Repeat Protein, Is Required for Plastid Cytochrome c Assembly in the Green Alga Chlamydomonas reinhardtii. Frontiers in plant science, 8, 1306. https://doi.org/10.3389/fpls.2017.01306
Viola S, Cavaiuolo M, Drapier D, et al. MDA1, a nucleus-encoded factor involved in the stabilization and processing of the atpA transcript in the chloroplast of Chlamydomonas. The Plant Journal: for Cell and Molecular Biology. 2019 Jun;98(6):1033-1047. DOI: 10.1111/tpj.14300.
Shin-Ichiro Ozawa, Marina Cavaiuolo, Domitille Jarrige, Richard Kuras, Mark Rutgers, Stephan Eberhard, Dominique Drapier, Francis-André Wollman, Yves Choquet, The OPR Protein MTHI1 Controls the Expression of Two Different Subunits of ATP Synthase CFo in Chlamydomonas reinhardtii, The Plant Cell, Volume 32, Issue 4, April 2020, Pages 1179–1203, https://doi.org/10.1105/tpc.19.00770
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.0.3/lib/libopenblasp-r0.3.10.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] gprofiler2_0.2.0 DiscoRhythm_1.6.0
## [3] pheatmap_1.0.12 ClassDiscovery_3.3.13
## [5] oompaBase_3.2.9 cluster_2.1.1
## [7] DESeq2_1.30.1 SummarizedExperiment_1.20.0
## [9] Biobase_2.50.0 GenomicRanges_1.42.0
## [11] GenomeInfoDb_1.26.7 IRanges_2.24.1
## [13] S4Vectors_0.28.1 BiocGenerics_0.36.1
## [15] MatrixGenerics_1.2.1 matrixStats_0.59.0
## [17] data.table_1.14.0 forcats_0.5.1
## [19] stringr_1.4.0 dplyr_1.0.6
## [21] purrr_0.3.4 readr_1.4.0
## [23] tidyr_1.1.3 tibble_3.1.2
## [25] tidyverse_1.3.0 knitr_1.33
## [27] factoextra_1.0.7 ggplot2_3.3.4
## [29] FactoMineR_2.4
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.1 shinydashboard_0.7.1 tidyselect_1.1.1
## [4] heatmaply_1.2.1 RSQLite_2.2.7 AnnotationDbi_1.52.0
## [7] htmlwidgets_1.5.3 grid_4.0.3 TSP_1.1-10
## [10] BiocParallel_1.24.1 munsell_0.5.0 codetools_0.2-18
## [13] DT_0.17 miniUI_0.1.1.1 withr_2.4.2
## [16] colorspace_2.0-1 highr_0.9 gnm_1.1-1
## [19] rstudioapi_0.13 leaps_3.1 oompaData_3.1.1
## [22] ggsignif_0.6.1 labeling_0.4.2 GenomeInfoDbData_1.2.4
## [25] farver_2.1.0 bit64_4.0.5 vctrs_0.3.8
## [28] generics_0.1.0 lambda.r_1.2.4 xfun_0.23
## [31] R6_2.5.0 seriation_1.2-9 locfit_1.5-9.4
## [34] bitops_1.0-7 cachem_1.0.5 DelayedArray_0.16.3
## [37] assertthat_0.2.1 promises_1.2.0.1 shinycssloaders_1.0.0
## [40] scales_1.1.1 nnet_7.3-15 ggExtra_0.9
## [43] gtable_0.3.0 MetaCycle_1.2.0 rlang_0.4.11
## [46] genefilter_1.72.1 systemfonts_1.0.1 scatterplot3d_0.3-41
## [49] splines_4.0.3 rstatix_0.7.0 lazyeval_0.2.2
## [52] shinyBS_0.61 broom_0.7.5 abind_1.4-5
## [55] BiocManager_1.30.10 yaml_2.2.1 reshape2_1.4.4
## [58] modelr_0.1.8 backports_1.2.1 httpuv_1.6.1
## [61] tools_4.0.3 ellipsis_0.3.2 kableExtra_1.3.4
## [64] jquerylib_0.1.4 RColorBrewer_1.1-2 Rcpp_1.0.6
## [67] plyr_1.8.6 zlibbioc_1.36.0 RCurl_1.98-1.3
## [70] ggpubr_0.4.0 viridis_0.6.1 haven_2.3.1
## [73] ggrepel_0.9.1 fs_1.5.0 magrittr_2.0.1
## [76] futile.options_1.0.1 magick_2.7.2 openxlsx_4.2.3
## [79] reprex_1.0.0 hms_1.1.0 shinyjs_2.0.0
## [82] mime_0.10 evaluate_0.14 xtable_1.8-4
## [85] XML_3.99-0.6 VennDiagram_1.6.20 rio_0.5.26
## [88] mclust_5.4.7 readxl_1.3.1 gridExtra_2.3
## [91] compiler_4.0.3 crayon_1.4.1 htmltools_0.5.1.1
## [94] mgcv_1.8-34 later_1.2.0 geneplotter_1.68.0
## [97] lubridate_1.7.10 DBI_1.1.1 formatR_1.9
## [100] dbplyr_2.1.1 MASS_7.3-53.1 relimp_1.0-5
## [103] BiocStyle_2.18.1 Matrix_1.3-4 car_3.0-10
## [106] cli_2.5.0 pkgconfig_2.0.3 flashClust_1.01-2
## [109] registry_0.5-1 foreign_0.8-81 plotly_4.9.3
## [112] xml2_1.3.2 foreach_1.5.1 svglite_2.0.0
## [115] annotate_1.68.0 bslib_0.2.5.1 webshot_0.5.2
## [118] XVector_0.30.0 rvest_1.0.0 digest_0.6.27
## [121] rmarkdown_2.8 cellranger_1.1.0 dendextend_1.14.0
## [124] curl_4.3.1 shiny_1.6.0 nlme_3.1-152
## [127] lifecycle_1.0.0 jsonlite_1.7.2 carData_3.0-4
## [130] futile.logger_1.4.3 qvcalc_1.0.2 viridisLite_0.4.0
## [133] fansi_0.5.0 pillar_1.6.1 lattice_0.20-41
## [136] fastmap_1.1.0 httr_1.4.2 survival_3.2-10
## [139] glue_1.4.2 zip_2.1.1 UpSetR_1.4.0
## [142] iterators_1.0.13 bit_4.0.4 stringi_1.6.2
## [145] sass_0.4.0 blob_1.2.1 memoise_2.0.0